Methods and systems for scalable deduplication

ABSTRACT

Methods, computer program products, computer systems, and the like are disclosed that provide for scalable deduplication in an efficient and effective manner. For example, such methods, computer program products, and computer systems can include receiving a data object at an assigned node, determining whether the data object includes a sub-data object, and processing the sub-data object. The assigned node is a node of a plurality of nodes of a cluster, where the data object includes a data segment and a signature. The signature is generated based, at least in part, on data of the data segment. The processing includes sending the sub-data object to a remote node. The remote node is another node of the plurality of nodes of the cluster.

FIELD OF THE INVENTION

The present invention relates to deduplication systems and, more particularly, to methods and systems for scalable deduplication.

BACKGROUND

An ever-increasing reliance on information and computing systems that produce, process, distribute, and maintain such information in its various forms, continues to put great demands on techniques for providing data storage and access to that data storage. Business organizations can produce and retain large amounts of data. While data growth is not new, the pace of data growth has become more rapid, the location of data more dispersed, and linkages between data sets more complex. Data deduplication offers business organizations an opportunity to dramatically reduce an amount of storage required for data backups and other forms of data storage and to more efficiently communicate backup data to one or more backup storage sites.

SUMMARY

The present disclosure describes methods, computer program products, computer systems, and the like that provide for scalable deduplication in an efficient and effective manner. Such methods, computer program products, and computer systems include receiving a data object at an assigned node, determining whether the data object includes a sub-data object, and processing the sub-data object. The assigned node is a node of a plurality of nodes of a cluster, where the data object includes a data segment and a signature. The signature is generated based, at least in part, on data of the data segment. The processing includes sending the sub-data object to a remote node. The remote node is another node of the plurality of nodes of the cluster.

In one embodiment, the data object includes a container, which, in turn, includes a deduplicated data store and a metadata store. The deduplicated data store includes one or more data segments. The metadata store includes metadata associated with the one or more data segments. Further, the one or more data segments can include a data segment, and the metadata can include a signature identifying the data segment and a location of the data segment in the deduplicated data store. Further still, the signature can be a fingerprint, and the fingerprint can be generated by performing a hash function on data of the data segment.
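
By way of illustration, the following is a minimal sketch of such fingerprint generation, assuming SHA-256 as the hash function (the disclosure does not mandate a particular hash function):

    import hashlib

    def fingerprint(segment_data: bytes) -> str:
        # Generate a signature (fingerprint) by performing a hash
        # function on the data of the data segment.
        return hashlib.sha256(segment_data).hexdigest()

    # Identical segments yield identical fingerprints; distinct
    # segments yield distinct fingerprints with high probability.
    assert fingerprint(b"abc") == fingerprint(b"abc")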

In one embodiment, the data object can comprise a container, and the sending the sub-data object to the remote node further includes sending the container to the remote node and sending a container reference to the remote node. In such an embodiment, the container reference can comprise a container identifier that identifies the container.

In another embodiment, the method can include receiving the sub-data object at the remote node, storing the container in a local deduplication pool at the remote node, and storing the container reference in a local reference database at the remote node. Further, in certain embodiments, the method can also include receiving a request for a fingerprint list from a client system, retrieving the fingerprint list from a catalog, and sending the fingerprint list to the client system. Further still, in certain embodiments, the method can further include receiving a request for a location of the fingerprint list from the client system, determining the location, and sending the location to the client system. Certain embodiments implement the catalog as a single instance for the cluster.

In certain embodiments, the data object includes a container and a container reference, and the method further includes storing the container in a local deduplication pool at the assigned node and storing the container reference (that identifies the container) in a local reference database at the assigned node. In such embodiments, the container can include a deduplicated data store and a metadata store.

The foregoing is a summary and thus contains, by necessity, simplifications, generalizations, and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the present disclosure, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of methods and systems such as those disclosed herein may be better understood, and their numerous objects, features, and advantages made apparent to those skilled in the art, by referencing the accompanying drawings.

FIG. 1 is a simplified block diagram illustrating an example of components of a scalable deduplication system, according to one embodiment.

FIG. 2 is a simplified block diagram illustrating an example of certain components of a scalable deduplication system in greater detail, according to one embodiment.

FIG. 3 is a simplified block diagram illustrating an example of components of a scalable deduplication system, in which data and metadata stores are depicted in greater detail, according to one embodiment.

FIG. 4 is a simplified block diagram illustrating an example of components of a scalable deduplication system, in which certain communicative couplings and changes therein are depicted, according to one embodiment.

FIG. 5 is a simplified block diagram illustrating an example of components of a scalable deduplication system, in which the relationship between various backups and the nodes on which those backups are stored is depicted, according to one embodiment.

FIG. 6 is a flow diagram illustrating an example of a scalable deduplicated backup process implemented in a scalable deduplication system, according to one embodiment.

FIG. 7 is a flow diagram illustrating an example of an assigned node selection process implemented in a scalable deduplication system, according to one embodiment.

FIG. 8 is a flow diagram illustrating an example of a fingerprint list retrieval process implemented in a scalable deduplication system, according to one embodiment.

FIG. 9 is a flow diagram illustrating an example of a reference request process implemented in a scalable deduplication system, according to one embodiment.

FIG. 10 is a flow diagram illustrating an example of a reference update process implemented in a scalable deduplication system, according to one embodiment.

FIG. 11 is a flow diagram illustrating an example of a fingerprint search process implemented in a scalable deduplication system, according to one embodiment.

FIG. 12 is a flow diagram illustrating an example of a data object save process implemented in a scalable deduplication system, according to one embodiment.

FIG. 13 is a flow diagram illustrating an example of a data object send process implemented in a scalable deduplication system, according to one embodiment.

FIG. 14 is a flow diagram illustrating an example of an assigned node data object storage process implemented in a scalable deduplication system, according to one embodiment.

FIGS. 15A and 15B illustrate a flow diagram depicting an example of an assigned node sub-data object storage process implemented in a scalable deduplication system, according to one embodiment.

FIG. 16 is a flow diagram illustrating an example of a remote node sub-data object storage process implemented in a scalable deduplication system, according to one embodiment.

FIG. 17 is a simplified block diagram illustrating an example of components of a scalable backup deduplication system and its operation, according to one embodiment.

FIG. 18 is a simplified block diagram illustrating components of an example computer system suitable for implementing embodiments of the present disclosure, according to one embodiment.

FIG. 19 is a simplified block diagram illustrating components of an example computer system suitable for implementing embodiments of the present disclosure, according to one embodiment.

While the present disclosure is susceptible to various modifications and alternative forms, specific embodiments of the present disclosure are provided as examples in the drawings and detailed description. It should be understood that the drawings and detailed description are not intended to limit the present disclosure to the particular form disclosed. Instead, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present disclosure as defined by the appended claims.

DETAILED DESCRIPTION

The following is intended to provide a detailed description and examples of the methods and systems of the disclosure, and should not be taken to be limiting of any inventions described herein. Rather, any number of variations may fall within the scope of the disclosure, as defined in the claims following the description.

While the methods and systems described herein are susceptible to various modifications and alternative forms, specific embodiments are provided as examples in the drawings and detailed description. It should be understood that the drawings and detailed description are not intended to limit such disclosure to the particular form disclosed. Instead, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the appended claims.

Introduction

Methods and systems such as those described herein provide for improved performance in deduplication systems and, more particularly, for scalability in deduplicated backup systems. Such methods and systems facilitate such scalability by implementing a shared-nothing architecture that employs a single instance of the catalog that maintains information regarding one or more backup operations that have been performed using the nodes of the given cluster (though that catalog can, in certain embodiments, be distributed), and localizes the storage of deduplicated data and its associated metadata. In so doing, such an architecture creates a local deduplication pool (and a local reference database therefor) on a per-node basis. Such an architecture avoids a number of problems that would otherwise arise in a distributed storage architecture.

In such a distributed storage architecture, data would be stored on multiple nodes, as would the metadata therefor. In such a scenario, finding a given portion of data is more complicated, and the potential for data and its metadata to become separated is greater. Global signature indexing (whether centralized or distributed) comes with high maintenance costs and significantly increased implementation and operational complexity. Further, such systems can suffer from a lack of data locality, resulting in an inability to provide sufficient throughput and a failure to scale with the number of nodes in the system.

Further, in the case in which deduplication is limited to one node, operations such as adding a node, load-balancing, and node failover/failback also produce a number of problems. For example, such operations, where deduplication is limited to a single node, result in a significant drop in the deduplication ratio, an inability to properly manage resource usage on the node, and abnormal increases in the backup window (owing to the inability to dynamically adjust to changes such as those just mentioned).

In implementing methods and systems such as those described herein, such architectures provide a number of advantages. For example, if a new node is added in such an architecture, certain data sources need only be redirected to the new assigned node, which is helpful with respect to load-balancing of computational and storage resources. Similarly, if one node experiences a heavy workload, one or more backup operations can be moved from that node to another of the nodes in the cluster. Further still, if the given assigned node becomes full, the remainder of the data/metadata from an ongoing backup can be forwarded to another of the cluster's nodes. And in the case of node failover/failback, deduplicated data and its metadata can simply be stored on another of the nodes of the cluster.

This can be accomplished, for example, by creating a local deduplication pool and associated local reference database on a per-node basis, thereby avoiding an increase in the memory requirements of any given node. In so doing, a data source's backup image is allowed to reference data on more than one node. Such implementations also maintain a desirable deduplication ratio by allowing a given backup operation to be moved from one node to another (or begun on a first assigned node, and continued on a second). Such support also provides the ability to smoothly transition backup operations from one node to another, which results in a reduction in backup window fluctuation. Further, because any node of a given cluster can be made the assigned node for a given backup operation, the fungibility of nodes in such a shared-nothing architecture reduces the architecture's complexity, as does the need to perform signature lookup on only a single node. This also allows subsequent backup images from the same data source to be stored on the same node, thereby providing data affinity and improving resource usage. Further still, a load/space balancing algorithm implemented in such an architecture enjoys more flexibility by virtue of the aforementioned fungibility of nodes and the ability for a data source to store backup images at any one or a combination of such nodes.

Methods and systems such as those described herein are thus able to address a variety of situations. Such situations include node addition, load-balancing, node-full conditions, and node failover/failback handling. As noted, nodes can be added to or removed from a given cluster essentially at will, under the management of a cluster manager, such as a cluster management server. Such a cluster management server can also provide load-balancing in an on-demand fashion. Nodes implemented according to such methods and systems are able to hand over data/metadata from an ongoing backup to another of the nodes in the cluster. The cluster management server can also handle failover and failback of the cluster's nodes.

Such methods and systems also provide for error handling. In the case of a failure or crash of an assigned node or a remote node (nodes of the cluster that are not presently the assigned node for the given backup operation), an architecture according to embodiments such as those described herein is able to handle such scenarios in the manner of a failed backup operation. In the case of the failure of an assigned node or a remote node, any partially added reference information that has been recorded would be removed. Further, in the case of the failure of an assigned node, the data object sent by a given data source (e.g., a client system) is not recorded in the catalog, and so an overall reference check is performed. Any reference and data object that has no related catalog entries would be treated as garbage, and so would be subject to garbage collection on a periodic basis. As will be appreciated in light of the present disclosure, such error scenarios are infrequent, and thus the performance of such error handling is acceptable.
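
The following is a minimal sketch of such a periodic garbage-collection pass, assuming illustrative structures (reference entries keyed by fingerprint, each carrying a hypothetical backup_id field) that are not dictated by the disclosure:

    def collect_garbage(reference_db: dict, catalog_backup_ids: set) -> None:
        # Treat any reference whose backup has no related catalog
        # entry as garbage, and remove it.
        for fp, entry in list(reference_db.items()):
            if entry["backup_id"] not in catalog_backup_ids:
                del reference_db[fp]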

In addition to the aforementioned advantages, architectures according to embodiments such as those described herein provide a number of other benefits. Given the shared-nothing architecture that the implementation of local reference databases and local deduplication pools provides, no cross-node data management is needed, and the performance of such architectures scales with an increase in the number of nodes employed. Such a per-node approach also means that the management of the data/metadata in question is simplified. As noted, this also means that signature lookup is limited to a single node, which provides both improved performance and simpler operation, and means that nodes can be tracked on a per-backup-source basis. The data sources can thus easily build client-side signature lists for such lookup. Such architectures also simplify the reference and write operations from the client-side perspective, because such client systems need only communicate with a single node in order to perform signature lookup and data object/reference writes (with the assigned node simply delegating any referencing and writing to remote nodes, if needed).

Example Architectures Providing Scalable Deduplication

FIG. 1 is a simplified block diagram illustrating components of an example scalable deduplication system (depicted in FIG. 1 as a scalable deduplication system 100), in which methods and systems of the present disclosure can be implemented. Scalable deduplication system 100 includes a network 105 that communicatively couples one or more client systems 110(1)-(N) (collectively, client systems 110), one or more storage nodes (or more simply, nodes; depicted in FIG. 1 as nodes 130(1)-(N), and referred to collectively as nodes 130), and a cluster management server 140. As depicted in FIG. 1, cluster management server 140 provides cluster management functionality for a cluster 150 that includes one or more of nodes 130. Cluster management server 140 also supports operations associated with certain embodiments by way of communication with nodes 130, and in particular, one or more of nodes 130 that support a catalog according to such embodiments (e.g., depicted in FIG. 1 as a catalog 160). Further, cluster management server 140 not only manages failover operations, failback operations, and the like, but also monitors the cluster for the status of its nodes and changes thereto (e.g., the addition of nodes, the removal of nodes, and the like). Cluster management server 140 is also responsible for choosing the next assigned node, using an algorithm and/or certain metrics to make such determinations.

While catalog 160 is depicted as being maintained at a single node (node 130(1)), which simplifies the operation of such embodiments, such an arrangement need not strictly be the case (e.g., catalog 160 might, in certain embodiments, be split among two or more of nodes 130). Further, while catalog 160 is depicted as being maintained at node 130(1), such an arrangement also need not be the case, and so catalog 160 could be maintained at any of nodes 130. In supporting such operations and arrangements, cluster management server 140 provides support for a scalable backup deduplication architecture according to methods and systems such as those described herein, the features and advantages of which are discussed subsequently.

It will be noted that variable identifiers such as those used herein (e.g., “N”) are used to more simply designate the final element (e.g., client system 110(N) or nodes 130(1)-(N)) of a series of related or similar elements (e.g., client systems or nodes). The repeated use of such variable identifiers is not meant to imply a correlation between the sizes of such series of elements, although such correlation may exist. The use of such variable identifiers does not require that each series of elements have the same number of elements as another series delimited by the same variable identifier (i.e., there need be no correlation between the number of client systems and the number of nodes, nor is such correlation to be implied). Rather, in each instance of use, the variable identifier may hold the same or a different value than other instances of the same variable identifier.

As noted, cluster management server 140 is communicatively coupled to client systems 110, and to nodes 130 of cluster 150, via network 105. Cluster management server 140 can include one or more physical servers configured to perform a variety of tasks related to the management of cluster 150, and the implementation of backup and deduplication services for scalable deduplication system 100, such as managing a full or incremental backup for one of client systems 110. In the system illustrated in FIG. 1, cluster management server 140 is further configured to communicate with the nodes of one or more clusters under management (e.g., nodes 130 of cluster 150) for purposes of storing full or incremental backup images from client systems 110 in resources controlled by cluster management server 140. Such communication can be via network 105 or via a direct link between cluster management server 140 and nodes 130. Information that can be provided by cluster management server 140 can include a unique identification associated with each data stream provided by one of client systems 110 to one or more of nodes 130. Cluster management server 140 can also provide sequence number identification to identify sequential data transmitted in each uniquely-identified data stream, and can also provide identifying information for a given backup (although such tasks can also be managed by client systems 110 themselves). Nodes 130 can then use such information to associate received data streams from client systems 110 in accord with various embodiments, as further discussed below.

Backup services can be implemented in scalable deduplication system 100 as a client-server application, with a server component (e.g., supported by cluster management server 140) and a client component (e.g., supported by each of client systems 110) of the client-server backup application. A server component can be configured to communicate with a client component during a backup process. Certain functions of the backup services can be performed by the client and server components, where the functions may be divided between the two components, or may be performed completely by one component or the other, depending on the implementation of the backup application. For example, cluster management server 140 can be configured to perform tasks related to managing backup operations, including communicating with client systems 110 to initiate backup tasks therefor, maintaining and managing information regarding the deduplicated backup data maintained at one or more of nodes 130, and other information regarding backups of client systems 110, as well as managing or tracking resources storing backup images for client systems 110 at one or more of nodes 130. It will be appreciated that nodes 130 can include a number of storage units, logical and/or physical, and such alternatives and modifications are intended to come within the scope of this disclosure.

One or more of client systems 110 (also referred to herein as client devices) can be implemented using, for example, a desktop computer, a laptop computer, a workstation, a server, or the like. An example of such computing devices is described subsequently, in connection with FIG. 18. One or more of client systems 110 can be configured to communicate with cluster management server 140 via network 105, as noted. An example of network 105, which can be used by client systems 110 to access cluster management server 140 and nodes 130, is a local area network (LAN) utilizing Ethernet, IEEE 802.11x, or some other communications protocol. Network 105 can also include a wide area network (WAN) and/or a public network such as the Internet, for example.

FIG. 1 also illustrates client system 110(1) as including user data 170 and metadata 180. Each of client systems 110 can store such information, and each of client systems 110 can store different user data 170 and metadata 180 in storage local to the given one of client systems 110. As will be appreciated in light of the present disclosure, in fact, a wide variety of data, metadata, executable programs, and other such information and software accessible by each of client systems 110 can be the subject of such backup operations.

User data 170 can include various data that is generated and/or consumed by applications, users, and other entities associated with client system 110(1). Moreover, user data 170, in the embodiment shown (as well as others), can also include executable files, such as those used to implement applications and operating systems, as well as files that are used or generated by such executable files. User data 170 can include files generated by user applications (e.g., word processing programs, email programs, graphics programs, a database application, or the like) executing on client system 110(1). Some of user data 170 may also be transferred to one or more of nodes 130 via network 105 to be included in a deduplicated data store. Each of client systems 110 can send different user data 170 to nodes 130.

Metadata 180 can include, for example, information regarding user data 170. Metadata 180 can be generated by client system 110(1), such as during a backup process. Upon an entity (e.g., an application or human user) requesting that client system 110(1) add all or part of user data 170 to a deduplicated data store (e.g., as part of a regularly scheduled full or partial backup), client system 110(1) reads user data 170 and generates metadata 180 regarding user data 170, such as one or more identifiers (e.g., signatures, hashes, fingerprints, or other unique identifiers) that identify different portions of user data 170. Client system 110(1) can process and communicate metadata 180 as a list (e.g., a list of signatures) to one or more of nodes 130. Metadata 180 can be used by client system 110(1), along with information received from one or more of nodes 130, to determine whether a portion of user data 170 is a duplicate (and so need only be referenced), or is unique (not duplicative of the data already stored at one of nodes 130), and so should be sent to the assigned node of nodes 130, and added to the reference database and deduplication pool thereof, as further discussed below.

In the architecture depicted in FIG. 1, nodes 130 variously store deduplicated data (e.g., in a local deduplicated data store (that is, local to each of nodes 130)), and its associated metadata (e.g., in a local reference database (that is, local to each of nodes 130) and/or as metadata in a metadata store of a container in which the aforementioned data is stored). By breaking user data 170 into some number of pieces (subunits of data) and deduplicating those pieces of data when performing a backup operation on user data 170, the components of scalable deduplication system 100 are able to transfer and store the data of the resulting backup image more efficiently, as the result of identifying unique ones of such subunits of data (or conversely, duplicate ones of such subunits of data).

For example, cluster management server 140 can manage deduplication services such as may be provided by a combination of the functionalities of client systems 110 and nodes 130 that eliminate duplicate data content in a backup context. As noted, deduplication services help reduce the amount of storage needed to store backup images of enterprise data (e.g., user data 170) and the resources consumed in communicating such backup images by providing a mechanism for storing a piece of information (e.g., a subunit of data) only once. Thus, in a backup context, if a piece of information is stored in multiple locations within an enterprise's computing systems (e.g., in multiple locations at a given one of client systems 110 and/or at multiple ones of client systems 110), that piece of information will only be stored once in a deduplicated backup image. Also, if the piece of information does not change between a first backup and a second backup, then that piece of information need not (and in certain embodiments, will not) be stored during the second backup, so long as that piece of information continues to be stored in the deduplicated backup image stored at one or more of nodes 130. Data deduplication can also be employed outside of the backup context, thereby reducing the amount of active storage occupied by files containing duplicate data (e.g., in their entirety, or in part).

As will be appreciated in light of the present disclosure, deduplication services can also be implemented in scalable deduplication system 100 as a client-server application (not shown), with a server component (e.g., residing on cluster management server 140) and a client component (e.g., residing on one or more of client systems 110) of the client-server application. For example, during a backup process for storing a backup of user data 170 in the local deduplicated data stores of one or more of nodes 130, a client component of the deduplication services can be configured to generate metadata 180 regarding user data 170, such as one or more identifiers, or signatures, that can identify different portions of user data 170, and to communicate metadata 180 to a server component. Certain functions of the deduplication services can be performed by the client and server components, where the functions may be divided between the two components, or may be performed completely by one component or the other, depending on the implementation of the backup application.

It will be appreciated that each of the foregoing components of scalable deduplication system 100, as well as alternatives and modifications thereto, are discussed in further detail below. In this regard, it will be appreciated that network storage can be implemented by any type of computer-readable storage medium, including, but not limited to, internal or external hard disk drives (HDD), optical drives (e.g., CD-R, CD-RW, DVD-R, DVD-RW, and the like), flash memory drives (e.g., USB memory sticks and the like), tape drives, removable storage in a robot or standalone drive, and the like. Alternatively, it will also be appreciated that, in light of the present disclosure, scalable deduplication system 100 and network 105 can include other components such as routers, firewalls, and the like that are not germane to the discussion of the present disclosure and will not be discussed further herein. It will also be appreciated that other configurations are possible.

FIG. 2 is a simplified block diagram illustrating an example of certain components of a scalable deduplication system in greater detail, according to one embodiment, and so depicts a scalable deduplication system 200. As before, scalable deduplication system 200 includes a number of client systems (client systems 110), a cluster management server (cluster management server 140), and a number of nodes (nodes 130), communicatively coupled to one another by way of a network (network 105). As noted earlier, cluster management server 140 can provide functionality including cluster management (e.g., as by way of a cluster manager 210) and deduplication management (e.g., as by way of a deduplication manager 215). It is to be understood that while cluster manager 210 and deduplication manager 215 are depicted in FIG. 2 as being separate from one another, their functionality can be integrated.

Also depicted in FIG. 2 is catalog 160, which is illustrated as being maintained at node 130(1). While thus illustrated, it is to be appreciated that catalog 160 can be distributed among more than one of nodes 130, and can be distributed among storage units of node 130(1). It is to be understood, however, that scalable deduplication system 200 typically maintains only a single instance of catalog 160 as the active catalog for the given cluster at any one time (while allowing other instances of catalog 160 to exist as inactive instances for the given cluster, which are therefore not available to the backup processes of scalable deduplication system 200).

Nodes 130 can maintain information such as a local reference database (e.g., examples of which are depicted in FIG. 2 as local reference databases 220(1)-(N), which are referred to in the aggregate as local reference databases 220) and a local deduplication pool (e.g., examples of which are depicted in FIG. 2 as local deduplication pools 225(1)-(N), which are referred to in the aggregate as local deduplication pools 225). Local reference databases 220 include information such as, for example, information identifying the containers in the local deduplication pool, information identifying the backup to which a given subunit of data (e.g., data segment) belongs, a fingerprint or other signature for that subunit of data, the location of that subunit of data in the corresponding one of local deduplication pools 225, and other such useful information. Further, a node's local reference database can not only reference data segments stored in that node's local deduplication pool, but can also reference data segments stored in other nodes' local deduplication pools (or the containers stored therein, e.g., as by way of container identifiers and node identifiers). Local deduplication pools 225 include the aforementioned subunits of data (data segments), as well as certain metadata associated therewith. In certain embodiments, local deduplication pools can be maintained in memory at their respective ones of nodes 130 (in order to provide improved performance), and subsequently persisted to persistent storage, while local reference databases 220 are maintained on such persistent storage.
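
A minimal sketch of one possible local reference database entry follows; the field names are illustrative assumptions, not fields mandated by the disclosure:

    from dataclasses import dataclass

    @dataclass
    class ReferenceEntry:
        fingerprint: str   # signature of the data segment
        backup_id: str     # backup to which the segment belongs
        container_id: int  # container holding the segment
        node_id: int       # node whose local pool stores that container
        offset: int        # location of the segment within the container

An entry's node_id is what would allow a reference to point at a container in another node's local deduplication pool, per the cross-node referencing described above.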

In this regard, an example with further detail of local reference databases 220 and local deduplication pools 225 is presented in connection with node 130(N). As with others of nodes 130, node 130(N) includes a local reference database (a local reference database 220(N)) and a local deduplication pool (a local deduplication pool 225(N)). Local reference database 220(N) can include, for example, references to the data segments stored in local deduplication pool 225(N). Local deduplication pool 225(N), in turn, can include a number of containers (e.g., depicted in FIG. 2 as containers 230(1)-(N), referred to in the aggregate as containers 230), for example. Containers 230 each include a metadata store (e.g., an example of which is depicted in FIG. 2 as a metadata store 240) and a deduplicated data store (e.g., an example of which is depicted in FIG. 2 as a deduplicated data store 250). Examples of such metadata stores and deduplicated data stores are described in further detail in connection with FIG. 3, subsequently. Node 130(N) maintains local reference database 220(N) and local deduplication pool 225(N) to maintain the deduplicated data for which node 130(N) is responsible. In so doing, the architecture of scalable deduplication system 200 implements a shared-nothing architecture, in which one or more of nodes 130 can be replaced by other such nodes (e.g., as by moving a given node's local reference database and local deduplication pool to the other node), node failure can be efficiently and effectively addressed, and other such advantages enjoyed.

As will be appreciated in light of the present disclosure, while a deduplication pool such as local deduplication pool 225(N) can maintain deduplicated data segments directly (e.g., as a storage area in which deduplicated data can be stored, and so storing only such deduplicated data segments, with all metadata residing, for example, in a local reference database such as local reference database 220(N)), local deduplication pool 225(N) is depicted in FIG. 2 as including a number of containers (depicted in FIG. 2 as containers 230(1)-(N), which are referred to in the aggregate as containers 230). An example of the contents of such containers (which are to be understood as logical in nature) is container 230(1), which includes metadata store 240 and deduplicated data store 250. In the implementation of containers 230 illustrated in FIG. 2, the deduplicated data segments stored in deduplicated data store 250 are referred to by information (e.g., fingerprints or other signature information) stored in metadata store 240, along with information about the portions of backup images stored in the corresponding local deduplication pool (e.g., such as that described previously). In such an implementation, local reference database 220(N) can include information (e.g., fingerprints or other signature information, container information, data segment offsets, and/or other information) that allows one or more of client systems 110 to determine whether a given deduplicated data segment is resident on the node in question (e.g., node 130(N)), while information in metadata store 240 allows the client system in question to find the given deduplicated data segment (e.g., in the case of the client system performing a restoration operation) or update a reference thereto (e.g., in the case of the client system performing a backup operation). An example of this latter implementation is also described in further detail in connection with FIG. 3, subsequently.

In light of the foregoing, it will be appreciated that various metadata maintained by each of nodes 130 can be stored in that node's local reference database, allowing client systems 110 to determine if portions of a backup image (e.g., portions of user data 170) are non-duplicative of portions already stored in the corresponding local deduplication pool. Once the client system in question determines that a portion of user data 170 is not duplicative of the data already stored in the assigned node's local deduplication pool (and thus should be added thereto), the client system can store that portion of user data 170 and its corresponding identifier (fingerprint or other signature) for the portion in a data object (e.g., a container, such as those discussed subsequently), and subsequently send the resulting data object to the assigned node. Examples of processes that can support such operations are described subsequently in connection with FIGS. 6-16.

On the assigned node, in certain embodiments employing containers, once a given container is full of unique portions, the entire container can be written to a location in the local deduplication pool. The container written to the local deduplication pool can also include a local container index, which indicates a local location of each unique portion stored within the container (or other such storage construct). This information can be maintained, for example, in the assigned node's local reference database. The local container index can contain a signature associated with each unique segment stored in the container, or alternatively can contain a shortened version of the signature of each unique segment stored in the container. An assigned node (e.g., node 130(N) in the present example) can store a reference to the container that identifies the container (e.g., a container reference such as a container identifier) in its local reference database. The signature of a unique portion can also be associated with the location of the unique portion in an entry of its container's metadata store. Thus, an identification of a portion's location, or a container identifier, can be found using the signature of the portion as a key in the metadata maintained at the assigned node. The location of the portion within the container identified by the container identifier can be found in the local container index for the container (local reference database) by using at least a portion of the signature as a key in that index, for example.

Multiple backup images can be stored by nodes 130. For example, a first backup image can be captured from user data 170 and can be stored in the local deduplication pools of one or more of nodes 130. A subsequent backup image captured from user data 170 can contain duplicate portions that are identical to portions of the first backup image already stored at nodes 130 and can contain unique portions that are different from portions of the first backup image (and so, portions that correspond to changes made to user data 170). The unique portions of the subsequent backup image can be sent to nodes 130, while the duplicate portions are not sent (since the duplicate portions are identical to instances of portions already stored by nodes 130). Since only single instances of portions of a backup image are stored at nodes 130, the local reference database and/or metadata stores at each node can provide a mapping of a backup image to the various non-duplicative portions that compose the backup image, which are stored in their respective local deduplication pools. Thus, a single backup image can be associated with multiple portions stored throughout scalable deduplication system 200, and multiple backup images can be associated with a single portion (e.g., the multiple backup images share the single portion).

FIG. 3 is a simplified block diagram illustrating an example of components of a node in a scalable deduplication system, in which data and metadata stores are depicted in greater detail, according to one embodiment. Thus, FIG. 3 depicts a scalable deduplication system 300 (a portion of a scalable deduplication system such as scalable deduplication system 200), and more particularly, a node 310 therein (e.g., in the manner of node 130(N) depicted in FIG. 2). In this more detailed depiction, node 310 can be seen to include a container management module 320, a data interface module 330, and a metadata interface module 340. In the manner of node 130(N), node 310 also maintains a local reference database 350 (in the manner of local reference databases 220 of FIG. 2) and a local deduplication pool 360 (in the manner of local deduplication pools 225 of FIG. 2). Local deduplication pool 360 includes one or more containers (e.g., an example of which is depicted in FIG. 3 as container 370).

In turn, container 370 includes a metadata store 380 (in the manner of metadata store 240 of FIG. 2), which includes a number of signatures (e.g., fingerprints; depicted in FIG. 3 as signatures 382(1)-(N), and referred to in the aggregate as signatures 382) and associated locations (depicted in FIG. 3 as locations 384(1)-(N), and referred to in the aggregate as locations 384) of their associated data segments, which are depicted as being stored in a deduplicated data store 390 as segments 395(1)-(N) (and referred to in the aggregate as segments 395).
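
As a rough illustration of this arrangement, the following sketch models a container as a metadata store mapping signatures to locations within a deduplicated data store; the class and method names are hypothetical:

    class Container:
        """Logical container: a metadata store plus a deduplicated data store."""

        def __init__(self, container_id: int):
            self.container_id = container_id
            self.metadata_store = {}       # signature -> (offset, length)
            self.data_store = bytearray()  # deduplicated data store

        def add_segment(self, signature: str, segment: bytes) -> None:
            # Store a unique segment and record its location.
            if signature not in self.metadata_store:
                self.metadata_store[signature] = (len(self.data_store), len(segment))
                self.data_store.extend(segment)

        def get_segment(self, signature: str) -> bytes:
            # Use the signature as a key to find the segment's location.
            offset, length = self.metadata_store[signature]
            return bytes(self.data_store[offset:offset + length])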

In order to perform data deduplication, a deduplication system needs to be able to identify redundant copies of data (e.g., files, data segments, or other units of data). One way that can provide a reasonable likelihood of finding duplicated instances of data is to divide file data into segments (e.g., data segments, such as consistently-sized segments, although techniques for using variable-sized segments exist), which are analyzed for duplication in the deduplicated data store. Thus, if only a portion of a large file is modified, then only the segment of data corresponding to that portion of the file need be stored (e.g., as one of segments 395 in deduplicated data store 390), and the remainder of the file segments need not be stored. In embodiments such as those described herein, a backup image file can be divided into a plurality of chunks, and each chunk can be divided into a plurality of fixed-size segments, for example. Thus, a signature can be searched for in signatures 382, and if a match is found, the location of the corresponding one of segments 395 can be determined using the corresponding one of locations 384. In certain embodiments, such searching is performed by one of client systems 110 by such client system sending a request for the requisite metadata (e.g., information from local reference database 350 and/or metadata from the metadata stores of the relevant ones of the containers in question (e.g., metadata store 380)), as is discussed subsequently.
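
A minimal sketch of such fixed-size segmentation follows, assuming an illustrative 4 KiB segment size (the disclosure does not fix a particular size):

    SEGMENT_SIZE = 4096  # assumed fixed segment size (4 KiB)

    def split_into_segments(chunk: bytes):
        # Divide a chunk of a backup image into fixed-size segments;
        # the final segment may be shorter than SEGMENT_SIZE.
        for offset in range(0, len(chunk), SEGMENT_SIZE):
            yield chunk[offset:offset + SEGMENT_SIZE]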

That being the case, rather than comparing a segment itself to each segment stored in the deduplicated data store (which can be enormously time- and processing-prohibitive), detection of duplicative data is usually performed by comparing smaller data signatures of each data segment. Client systems 110 can thus use a signature such as signatures 382 to determine whether a given segment is already stored in deduplicated data store 390. Each such signature can be a checksum or hash value that is calculated based on data within the data segment. In many embodiments, signatures such as fingerprints can be generated in a manner that produces the same identifier for identical items of data (e.g., using a cryptographically strong, collision-resistant hash function), while also producing different identifiers for non-identical items of data. Regardless of which particular technique is used to generate such signatures, the same signature-generation technique will typically be implemented by all deduplication performed by client systems 110, although techniques exist to allow for the use of multiple signature-generation techniques. In one example, signature generation can be performed by deduplication clients (e.g., client software modules running on client systems 110 of FIG. 1). Signatures generated by client software on client systems 110 can be used to search signatures provided by nodes such as nodes 130. In so doing, such an approach avoids the need to communicate duplicate data segments, and so is significantly less resource intensive (given that the corresponding signatures are significantly smaller, in terms of size, compared to the data segments they represent). That being the case, client systems 110 need only communicate the unique data segments (and their associated signatures) to the appropriate one(s) of nodes 130.

By comparing a newly generated signature of a new segment to signatures 382 of segments 395, client systems 110 can determine whether the new segment should be added to deduplicated data store 390 (e.g., the new segment is a unique segment). In particular, if a new segment's signature does not match any existing signatures maintained by nodes 130, the client system in question can determine that the new segment is not already stored within those segments. In response, a client system can add the new segment and its signature to a data object that will be sent to the assigned node of nodes 130, upon completion of the backup operation, as described subsequently herein. The client system in question, one of client systems 110, can use the metadata that the client system maintains (e.g., metadata 180) to provide additional information to the assigned node (e.g., identify each requested segment by its corresponding signature, backup identifier, time and date, and/or other information). Client systems 110 can transmit the requisite segments, associated fingerprints, and other associated information over network 105 via a data stream, for example.
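
The following self-contained sketch illustrates this client-side decision, again assuming SHA-256 fingerprints and the fixed segment size sketched earlier; the names are illustrative:

    import hashlib

    SEGMENT_SIZE = 4096  # assumed fixed segment size

    def build_data_object(chunk: bytes, known_signatures: set):
        """Collect unique segments (with signatures) for the assigned node."""
        data_object, duplicates = [], []
        for offset in range(0, len(chunk), SEGMENT_SIZE):
            segment = chunk[offset:offset + SEGMENT_SIZE]
            fp = hashlib.sha256(segment).hexdigest()
            if fp in known_signatures:
                duplicates.append(fp)              # duplicate: reference only
            else:
                data_object.append((fp, segment))  # unique: send to assigned node
                known_signatures.add(fp)
        return data_object, duplicates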

As the requested segments are received, the assigned node can write the segments into a fixed-size container located in its memory, such as a cache. Once the container is full, the entire container can be written to a location in its local deduplication pool. As noted, this operation can also be performed with respect to a container (or, depending on the implementation, the data segments stored therein) stored in a cloud deduplication pool. The assigned node can also use metadata generated thereby, such as locations 384, that indicates the location of each segment written to deduplicated data store 390. For example, each unique segment can be associated with a location (e.g., location 384(1)) of the particular segment, such as a container identification (container ID) that contains the unique segment (e.g., segment 395(1)).
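
One possible shape for this fill-and-flush behavior on the assigned node is sketched below; the capacity constant and the representation of the pool are assumptions made for illustration:

    class AssignedNodeWriter:
        CONTAINER_CAPACITY = 64 * 2**20  # assumed 64 MiB fixed container size

        def __init__(self, pool: list):
            self.pool = pool       # stand-in for the local deduplication pool
            self.current = {}      # in-memory container: signature -> segment
            self.current_size = 0

        def write_segment(self, signature: str, segment: bytes) -> None:
            # Buffer incoming unique segments in an in-memory container.
            self.current[signature] = segment
            self.current_size += len(segment)
            if self.current_size >= self.CONTAINER_CAPACITY:
                self.flush()

        def flush(self) -> None:
            # Once the container is full, write it in its entirety to a
            # location in the local deduplication pool.
            if self.current:
                self.pool.append(self.current)
                self.current, self.current_size = {}, 0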

FIG. 4 is a simplified block diagram illustrating an example of components of a scalable deduplication system, in which certain communicative couplings and changes therein are depicted, according to one embodiment. A scalable deduplication system 400, in the manner of other such scalable deduplication systems, is thus depicted as including a number of client systems (client systems 110), a number of nodes (nodes 130, which are members of cluster 150), and a cluster management server (cluster management server 140). As also depicted earlier, cluster management server 140 has access to a catalog (catalog 160) maintained at node 130(1), and is communicatively coupled to nodes 130 (and also, though not shown in FIG. 4, to client systems 110). Scalable deduplication system 400 is presented here to illustrate concepts related to the addition of one or more nodes to cluster 150, and the removal of one or more nodes from cluster 150.

In the former case, a node such as node 130(6) might be added to cluster 150 in order to provide additional deduplication pool resources to cluster 150. In such a case, cluster management server 140 will manage the inclusion of node 130(6) in cluster 150, and in so doing, provide such added resources to client systems 110. Ones of client systems 110 (e.g., client systems 110(4) and 110(N), in the example presented in FIG. 4, as indicated by the dashed lines appearing therein) can avail themselves of such additional resources in situations such as their present assigned node becoming full, for purposes of load-balancing, and other such advantageous purposes. The architecture of scalable deduplication system 400 and other such implementations are able to take advantage of their shared-nothing architecture in such situations as a result of the fungibility of the nodes employed therein.

In the latter case, a node such as node 130(6) might be removed from cluster 150 as the result of, for example, the need for maintenance of such a node's hardware or software. In such a case, cluster management server 140 again manages the removal of node 130(6) from cluster 150, which can include moving or copying the node's local reference database and local deduplication pool to another of nodes 130, thereby allowing node 130(6) to be gracefully taken off-line and shut down. Here again, access to node 130(6) by client systems 110(4) and 110(N) is shifted to the ones of nodes 130 to which the local reference database and local deduplication pool in question are moved (again, in the example presented in FIG. 4, as indicated by the dashed lines appearing therein), where the movement of such information could be, for example, to node 130(4) for client systems 110(4) and 110(N).

FIG. 5 is a simplified block diagram illustrating an example of components of a scalable deduplication system, in which the relationship between various backups and the nodes on which those backups are stored is depicted, according to one embodiment. Using certain components described in connection with FIGS. 1-4, a scalable deduplication system 500 is depicted, in which a number of backups are made (thereby allowing a description of the processes involved). That being the case, client system 110(1) generates a number of backup images (e.g., depicted in FIG. 5 as backup images 510(1)-(N), which are referred to in the aggregate as backup images 510), which are stored variously among nodes 130(1)-(N). As will be appreciated in light of the present disclosure, backup images 510 can be representative of full or incremental backup operations.

As is illustrated, backup images can be stored entirely on a single node, split between nodes (e.g., in the case in which the storage of a first assigned node becomes full), or stored with other backup images at a given node. In the situation depicted in FIG. 5, node 130(1) stores backup image 510(1) in its entirety, and a portion of a backup image 510(2). The remainder of backup image 510(2) is stored at node 130(2), along with backup images 510(3) and 510(4), in their entirety. By contrast, backup image 510(5) is stored alone and in its entirety at node 130(3), as is the case for backup image 510(N) being stored at node 130(N).

Example Processes for Scalable Deduplication Systems

FIG. 6 is a flow diagram illustrating an example of a scalable deduplicated backup process implemented in a scalable deduplication system, according to one embodiment. FIG. 6 thus depicts a scalable deduplicated backup process 600, which begins with the assignment of an assigned node to the backup operation to be performed (605). An example of a process for assigning an assigned node to a backup operation is provided in connection with FIG. 7 and its description, subsequently.

Next, one or more fingerprint lists are retrieved (610). For example, a fingerprint list corresponding to the deduplicated data segments of the last full backup operation may be retrieved. Additionally, one or more fingerprint lists corresponding to the deduplicated data segments of one or more incremental backup operations performed subsequent to the last full backup operation may also be retrieved. An example of a process for retrieving such fingerprint lists is described in connection with FIG. 8, subsequently.

Having retrieved the requisite fingerprint list(s), scalable deduplicated backup process 600 proceeds with the selection of a fingerprint for the first (or next) data segment that is to be searched for (615). Scalable deduplicated backup process 600 then searches the fingerprint list(s) for the selected fingerprint (620). A determination is then made as to whether the selected fingerprint was found in the fingerprint list(s) in question (625).

In the case in which the selected fingerprint is found in one of the fingerprint lists, a process of updating a reference to the (deduplicated) data segment, which is already stored on the assigned node or one of the other (remote) nodes, is performed (630). An example of a process for updating such references is provided in connection with FIGS. 9 and 10, and their associated descriptions, subsequently. Further, it is to be appreciated that, while nodes other than the assigned node are among the nodes of the cluster in question (e.g., nodes 130 of cluster 150), such nodes are referred to herein as remote nodes, in order to distinguish such remote nodes from the assigned node (the node at which any data segments resulting from the current backup operation, as well as any associated metadata (e.g., fingerprints), are stored).

A determination is then made as to whether the reference update operation was successful (635). If the reference update operation was successful, a determination is then made as to whether further fingerprints remain to be selected and searched for (640). If further fingerprints remain to be searched, scalable deduplicated backup process 600 loops to the selection of the next fingerprint to be searched for (615). Otherwise, if the fingerprints for the backup operation have been searched, one or more data objects (though typically, one data object) are sent to the assigned node (645). Such a data object can be, for example, a container such as one of containers 230 in FIG. 2. An example of a process for sending a data object to an assigned node is provided in connection with FIG. 13 and its associated description, subsequently. Scalable deduplicated backup process 600 then concludes.

Alternatively, in the case in which the reference update operation was not successful (635), an indication is provided (e.g., such as to cluster management server 140), indicating that the reference update operation has failed (650). Scalable deduplicated backup process 600 then concludes.

Returning to the determination as to whether the selected fingerprint was found in the fingerprint lists in question (625), if the selected fingerprint was not found in those fingerprint lists, a fingerprint search process is performed on the assigned node (660). An example of a process for performing a fingerprint search process on the assigned node is described in greater detail in connection with FIG. 11, subsequently.

A determination is then made as to whether the selected fingerprint was found as a result of the search performed on the assigned node (670). If the selected fingerprint was found on the assigned node, the reference update process noted earlier is performed (630). As before, a determination as to the success of this reference update operation is made (635). If the reference update operation was successful, a determination is made as to whether additional fingerprints remain to be searched (640), and either scalable deduplicated backup process 600 loops to the selection of the next fingerprint (615) or any data object produced by scalable deduplicated backup process 600 is sent to the assigned node (645) (with scalable deduplicated backup process 600 then concluding).

Alternatively, if the selected fingerprint is not found on the assigned node (670), scalable deduplicated backup process 600 proceeds with the inclusion of the selected fingerprint and the data segment associated therewith in the data object noted earlier (680). As will be appreciated in light of the present disclosure, such an instance reflects that the data segment in question is unique (its fingerprint having not been found), with no copy thereof being stored on the cluster's nodes. An example of a process for including fingerprints and their associated data segments in the data object is described in connection with FIG. 12, subsequently.

A determination is then made as to whether additional fingerprints remain to be searched (685). If further fingerprints remain to be searched, scalable deduplicated backup process 600 loops to the selection of the next fingerprint to be searched for (615). Otherwise, if the fingerprints for the backup operation have been searched, one or more data objects (though typically, one data object) that have been created are sent to the assigned node (645) (by way of connector "A"). As noted, an example of a process for sending a data object to an assigned node is provided in connection with FIG. 13 and its associated description, subsequently. Scalable deduplicated backup process 600 then concludes.
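
The control flow just described can be summarized in a short sketch. The following Python is a minimal, illustrative rendering of the client-side loop of FIG. 6; the names (run_backup, segments, fingerprint_lists, local_index, refcounts) are hypothetical and not part of the disclosure, and a real implementation would use the cluster's messaging layer rather than in-memory dictionaries:

    def run_backup(segments, fingerprint_lists, local_index, refcounts):
        """Sketch of FIG. 6: segments maps fingerprint -> segment data;
        fingerprint_lists come from the last full/incremental backups;
        local_index models the assigned node's reference database."""
        data_object = {}                                       # container being built
        for fp, data in segments.items():                      # select fingerprint (615)
            found = any(fp in fl for fl in fingerprint_lists)  # search the lists (620/625)
            if not found:
                found = fp in local_index                      # search assigned node (660/670)
            if found:
                refcounts[fp] = refcounts.get(fp, 0) + 1       # update reference (630)
            else:
                data_object[fp] = data                         # unique segment; include (680)
        return data_object                                     # sent to assigned node (645)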

FIG. 7 is a flow diagram illustrating an example of an assigned node selection process implemented in a scalable deduplication system, according to one embodiment. An assigned node selection process 700 begins with the identification of one or more available nodes (e.g., one or more of nodes 130 available for selection as an assigned node) (710). A determination is then made as to one or more metrics (node metrics) available for use in selecting the assigned node (720). Such metrics can include information with regard to the workload of each node available for selection (and so selecting the node with the lowest workload), the amount of free space at each node (selecting, for example, the node with the largest amount of free space, which could be affected by how recently the node was added to the cluster), data affinity, the capabilities of each node (e.g., computational performance, network bandwidth supported, proximity (logical and/or physical), age of backup information stored (increasing the likelihood of duplicates), or the like), or other such factors. The available node(s) and one or more node metrics having been determined, an assigned node is selected from the available nodes, using the one or more node metrics (730). Assigned node selection process 700 then concludes.
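
A minimal Python sketch of such metric-based selection follows; the weighting of the metrics is illustrative (the disclosure leaves the combination of node metrics open), and the field names are hypothetical:

    def select_assigned_node(nodes):
        """Pick the assigned node from candidate-node records (710-730).
        Lower workload, more free space, and higher affinity score better;
        a production system would normalize these disparate units."""
        def score(node):
            return (0.5 * node["workload"]
                    - 0.3 * node["free_space_gb"] / 1000.0
                    - 0.2 * node["affinity"])
        return min(nodes, key=score)

    nodes = [
        {"name": "node-1", "workload": 0.8, "free_space_gb": 200, "affinity": 0.1},
        {"name": "node-2", "workload": 0.3, "free_space_gb": 500, "affinity": 0.9},
    ]
    print(select_assigned_node(nodes)["name"])  # node-2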

FIG. 8 is a flow diagram illustrating an example of a fingerprint list retrieval process implemented in a scalable deduplication system, according to one embodiment. A fingerprint list retrieval process 800 is thus depicted, and begins with the identification of the last full backup of the data in question (backup image) by the scalable backup deduplication system (810). A determination is then made as to whether any incremental backups (backup images) have been made subsequent to the last full backup (820). Such might be the case, for example, in an architecture in which changes made to the data are tracked subsequent to such a full backup. If one or more incremental backups exist, those incremental backups are identified (830). Having identified the last full backup and any incremental backups, fingerprint list retrieval process 800 proceeds to sending a request for the requisite fingerprint list(s) to the catalog node (840), the catalog node being the one or more nodes of nodes 130 at which the catalog for the last full backup and any incremental backups is stored. Information as to such catalog nodes can be provided, for example, by a cluster management server such as cluster management server 140.

In return (e.g., as from the aforementioned cluster management server), the locations of the fingerprint lists for the last full backup and any incremental backups are received by the client system (850). Next, using this information, the client system can retrieve the requisite fingerprint lists for the last full backup and any incremental backups from the locations identified therein (860). Fingerprint list retrieval process 800 then concludes.
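
As an illustration, the retrieval logic of FIG. 8 might be sketched as follows, with the catalog modeled as a chronological list of backup records (the record fields are hypothetical):

    def retrieve_fingerprint_lists(catalog, source):
        """Sketch of FIG. 8: identify the last full backup (810), gather any
        later incrementals (820/830), and return their fingerprint lists
        (840-860)."""
        backups = [b for b in catalog if b["source"] == source]
        last_full = max((i for i, b in enumerate(backups) if b["type"] == "full"),
                        default=None)
        if last_full is None:
            return []          # no full backup yet; nothing to deduplicate against
        return [b["fingerprint_list"] for b in backups[last_full:]]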

FIG. 9 is a flow diagram illustrating an example of a reference request process implemented in a scalable deduplication system, according to one embodiment. A reference request process 900 is thus depicted that begins with a determination of the location of the data segment in question (910). Once the location of the data segment in question has been determined, this location information is included in a reference update message (920). Also included in the reference update message is information identifying the backup in question (925). The reference update message is then sent to the assigned node (930). As will be appreciated in light of the present disclosure, the reference update message can, in the alternative, be sent directly to a node storing the data segment in question, in certain embodiments. Further in this regard, such embodiments can then send the requisite information (e.g., the location of the data segment and information identifying the backup in question, as well as information identifying the node storing the data segment) to the assigned node. In any event, a determination is made as to whether an update result message has been received (940). In so doing, reference request process 900 iterates until the receipt of the update result message.

Having received the update result message, a determination is made as to whether the reference update operation was successful (950). If the update result message indicates that the reference update was unsuccessful, an indication to this effect is made (960) and reference request process 900 concludes. Alternatively, if the update result message indicates that the reference update was successful, an indication to this effect is made (970) and reference request process 900 concludes.
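
A sketch of the client side of this exchange appears below; send and receive stand in for the cluster's messaging layer, and the message fields are illustrative rather than a prescribed wire format:

    import json

    def request_reference_update(segment_location, backup_id, send, receive):
        """Sketch of FIG. 9: build the reference update message (920/925),
        send it to the assigned node (930), and block until the update
        result message arrives (940), reporting success or failure (950-970)."""
        message = json.dumps({"segment_location": segment_location,  # (920)
                              "backup_id": backup_id})               # (925)
        send(message)                                                # (930)
        return receive() == "success"                                # (940/950)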

FIG. 10 is a flow diagram illustrating an example of a reference update process implemented in a scalable deduplication system, according to one embodiment. FIG. 10 thus depicts a reference update process 1000, as can be executed by an assigned node, for example. Reference update process 1000 begins with the receipt of a reference update request (1010), as might be received, for example, from a client system such as one of client systems 110. Once the node in question is in receipt of the reference update request, information regarding the location of the data segment in question is retrieved from the reference update message (1020). Also retrieved from the reference update message is information identifying the backup for which the reference is to be updated (1030). At this juncture, a reference is added to the metadata for the data segment in question (1040). Such an operation can, for example, result in an update to a local reference database on a given node (e.g., local reference databases 220 of nodes 130) and/or reference information in a metadata store (e.g., metadata store 240).

An attempt to add the requisite reference having been made, a determination is made as to whether the reference was successfully added (1050). If the reference was successfully added to the metadata in question, a reference update success message is sent by the assigned node, for example, to the client system in question (1060), and reference update process 1000 then concludes. In the alternative, if the addition of the reference was not successful, a reference update failure message is sent by the assigned node to the client system in question (1070), and, as before, reference update process 1000 concludes.
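
On the assigned node's side, the update of FIG. 10 reduces to adding a backup-to-segment reference and reporting the outcome. A minimal sketch follows, with the metadata modeled as a mapping from segment location to the set of referencing backups (a hypothetical structure, not the disclosure's metadata store format):

    def handle_reference_update(message, metadata):
        """Sketch of FIG. 10: extract the segment location (1020) and backup
        identifier (1030), add the reference (1040), and report (1050-1070)."""
        try:
            location = message["segment_location"]
            backup_id = message["backup_id"]
            metadata.setdefault(location, set()).add(backup_id)  # add reference (1040)
            return "reference update success"                    # (1060)
        except KeyError:
            return "reference update failure"                    # malformed message (1070)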

FIG. 11 is a flow diagram illustrating an example of a fingerprint search process implemented in a scalable deduplication system, according to one embodiment. FIG. 11 thus depicts a fingerprint search process 1100, as can be carried out by an assigned node in performing a fingerprint search requested by a client system. Fingerprint search process 1100 begins with receipt of a fingerprint search request from such a client system (1110). The assigned node then searches the local reference database for the fingerprint identified in the fingerprint search request (1120). A determination is then made as to whether the fingerprint in question has been found (1130). If the fingerprint in question is found at the assigned node, a fingerprint search result message indicating that the fingerprint has been found (and, optionally, including information regarding that fingerprint (e.g., its location and/or other identifying information)) is sent to the requesting client system (1140). Fingerprint search process 1100 then concludes. Alternatively, if the fingerprint in question was not found, a fingerprint search result message is sent indicating that the fingerprint was not found (1150), and fingerprint search process 1100 once again concludes.
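
The search itself can be as simple as a keyed lookup in the local reference database, as in this sketch (the dictionary model of the database is an assumption made for illustration):

    def handle_fingerprint_search(fingerprint, reference_db):
        """Sketch of FIG. 11: look up the requested fingerprint (1120/1130)
        and build the search result message (1140/1150)."""
        if fingerprint in reference_db:
            return {"found": True, "location": reference_db[fingerprint]}
        return {"found": False}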

FIG. 12 is a flow diagram illustrating an example of a data object save process implemented in a scalable deduplication system, according to one embodiment. FIG. 12 thus depicts a data object save process 1200, in which a client system includes, in a data object, a data segment, its associated fingerprint, and other related information, such as information identifying the backup being performed. Data object save process 1200 thus begins with saving the fingerprint in question in the data object (1210). As noted, the data segment in question is also saved in the data object (1220). A determination is then made as to whether the fingerprint, data segment, and, optionally, other related information, was saved successfully in the data object (1230). If the fingerprint, data segment, and, optionally, other related information was successfully saved in the data object, an indication to this effect is made (1240), and data object save process 1200 concludes. Alternatively, if a failure occurred in the saving of the fingerprint, the data segment, and/or other related information, an indication of this failure is provided (1250), and, as before, data object save process 1200 concludes.
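
A sketch of such a save follows, with the data object modeled as separate metadata and segment maps (an illustrative layout, not the container format of the disclosure):

    def save_to_data_object(data_object, fingerprint, segment, backup_id):
        """Sketch of FIG. 12: save the fingerprint (1210) and the segment
        (1220), then verify the save and report (1230-1250)."""
        data_object["metadata"][fingerprint] = {"backup_id": backup_id}
        data_object["segments"][fingerprint] = segment
        saved = (fingerprint in data_object["metadata"]
                 and fingerprint in data_object["segments"])
        return "saved" if saved else "save failed"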

FIG. 13 is a flow diagram illustrating an example of a data object send process implemented in a scalable deduplication system, according to one embodiment. FIG. 13 thus depicts a data object send process 1300, as can be carried out by a client system sending a data object according to certain embodiments, for example. Data object send process 1300 begins with the closing of the data object by the client system (1310). The data object, having been closed, is then sent to the assigned node (1320). Data object send process 1300 then awaits the receipt of notification from the assigned node, indicating that the storage operation has completed (1330). A determination is then made as to whether the storage operation was successful (1340). If the storage operation was successful, an indication as to the successful storage of the fingerprint, the data segment, and any other related information is provided (1350), and data object send process 1300 then concludes. Alternatively, if storage of one or more of the fingerprint, the data segment, and/or any related information was not successful, the failure in the storage process is indicated (1360), and data object send process 1300 once again concludes.
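
The send side can be sketched as a close-send-await sequence; here, transport is a stand-in object with send() and recv() methods, since the disclosure does not prescribe a particular transport:

    def send_data_object(data_object, transport):
        """Sketch of FIG. 13: close the data object (1310), send it to the
        assigned node (1320), and await the storage notification (1330),
        returning the success determination (1340)."""
        data_object["closed"] = True          # no further segments accepted (1310)
        transport.send(data_object)           # (1320)
        return transport.recv() == "stored"   # (1330/1340)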

FIG. 14 is a flow diagram illustrating an example of an assigned node data object storage process implemented in a scalable deduplication system, according to one embodiment. FIG. 14 thus depicts an assigned node data object storage process 1400, which begins with the receipt of a data object at the assigned node, having been sent from the client system in question (1405). The assigned node, having received the data object, locks one or more local resources in order to update those resources with the information in the data object received (1410). Information in the data object (e.g., information regarding the fingerprint and associated data segment, as well as other information such as information identifying the backup operation completed) is then stored at the assigned node (1415). At this juncture, one or more local references can be added to the metadata maintained at the assigned node (e.g., in the local reference database and/or in one or more metadata stores) (1420). The aforementioned information having been stored, the one or more local resources can then be unlocked by the assigned node (1425).

A determination is then made as to whether the data object was successfully stored (1430). In the case in which storage of the data object was unsuccessful, an indication to this effect is provided by the assigned node to the client system (1440). Alternatively, if the data object was stored successfully, the stored data object is then analyzed by the assigned node (1450). Assigned node data object storage process 1400, having analyzed the stored data object, makes a determination as to whether the stored data object includes one or more remote references (1460). In the case in which the stored data object does not include any remote references (a remote reference, as used herein, indicating a reference to a data segment maintained on a node other than the assigned node), an indication can be made to the client system indicating that the data object received from the client system was stored successfully (1470). As will be appreciated in light of the present disclosure, such is the case because no further operations are needed to store the data object, the data object having been stored successfully (and having no remote references). Assigned node data object storage process 1400 then concludes.

Alternatively, in the case in which the data object includes one or more remote references (1460), a process of generating one or more corresponding sub-data objects, and either storing a given one of the sub-data objects at the assigned node or sending that information to the appropriate remote node(s), is performed. To this end, a sub-data object is selected from among one or more sub-data objects included in the data object being stored (1475). Sub-data object processing is then performed on the selected sub-data object (1480). An example of a process for generating and sending such sub-data objects is provided in connection with FIG. 15 and its description, subsequently. A determination is then made as to whether the sub-data object in question was saved successfully (1485). In the case in which the sub-data object was not successfully stored, an indication to this effect is sent to the client system (1440). As can be seen in FIG. 14, such an indication can be made with respect to the particular sub-data object in question, or more simply by an indication that, as a whole, the data object was not stored successfully.

Alternatively, if the sub-data object in question was successfully stored, a determination can be made as to whether additional sub-data objects remain to be processed (1490). If further sub-data objects remain to be processed, assigned node data object storage process 1400 proceeds to the selection of the next sub-data object (1495), and iterates, performing the aforementioned sub-data object processing on the selected sub-data object (1480). Assigned node data object storage process 1400 then continues as noted above.

If further sub-data objects do not remain to be processed, assigned node data object storage process 1400 proceeds to making an indication that the data object in question was successfully stored (1470). Assigned node data object storage process 1400 then concludes.
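
The lock-store-unlock sequence and the sub-data object dispatch loop of FIG. 14 can be sketched as follows; the in-memory pool, reference database, and process_sub_object placeholder are assumptions standing in for the node's real storage and for the FIG. 15 processing:

    def store_data_object(data_object, pool, reference_db, lock):
        """Sketch of FIG. 14: lock local resources (1410), store segments
        (1415), add local references (1420), unlock (1425), then process any
        sub-data objects bearing remote references (1460-1495).
        `lock` is, e.g., a threading.Lock()."""
        with lock:                                        # lock/unlock (1410/1425)
            pool.update(data_object["segments"])          # store information (1415)
            for fp in data_object["segments"]:
                reference_db[fp] = reference_db.get(fp, 0) + 1   # local refs (1420)
        for sub in data_object.get("sub_objects", []):    # remote references? (1460)
            if not process_sub_object(sub):               # per-sub-object work (1480/1485)
                return "data object store failed"         # report failure (1440)
        return "data object stored"                       # report success (1470)

    def process_sub_object(sub):
        # Placeholder: a real implementation would store locally or send the
        # sub-data object to its remote node (FIG. 15) and await status.
        return True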

FIGS. 15A and 15B illustrate a flow diagram depicting an example of an assigned node sub-data object storage process implemented in a scalable deduplication system, according to one embodiment. FIGS. 15A and 15B thus depict an assigned node sub-data object storage process 1500, which begins with the identification of the data object in question (1510). Next, sub-data object information for the given sub-data object is determined (1515). The sub-data object itself is then generated (1520). Next, the node that is to maintain the sub-data object is identified (1525). A determination is then made as to whether the node thus identified is the assigned node (1530). If the identified node is the assigned node, the sub-data object is stored locally (at the assigned node), in the manner noted with regard to the storage of the data object (that being the lock, add, unlock sequence noted earlier) (1540). It is to be understood that, while such an outcome is possible, it will typically not be the case, as such data would simply be stored as part of the data object in question as part of the backup operation (though situations may exist in which this occurs, such as if the sub-data object were the result of a separate backup operation). Alternatively, if the identified node is not the assigned node (and so is another node, such as one of nodes 130, referred to herein as a "remote node" (indicating that such node is not the assigned node)), the assigned node sends the sub-data object to the remote node (1550). In such situations, the assigned node can send a "reference request" to such remote nodes, where data maintained at such nodes is referenced by the given backup. The assigned node then awaits receipt of a status message from the remote node (1555).

At this juncture, whether the sub-data object was stored locally or by a remote node, assigned node sub-data object storage process 1500 proceeds, via connector "A", to a determination as to whether the sub-data object was stored successfully (1560). In the case in which some manner of failure occurred in the storage of the sub-data object, assigned node sub-data object storage process 1500 proceeds with making an indication that the storage of the sub-data object was unsuccessful (1565), and assigned node sub-data object storage process 1500 concludes.

Alternatively, if the sub-data object in question was successfully stored, a determination as to whether more sub-data objects remain to be processed is made (1570). If further sub-data objects remain to be processed, assigned node sub-data object storage process 1500 proceeds, via connector "B", to making a determination as to the sub-data object information for the next sub-data object (1515), and assigned node sub-data object storage process 1500 continues. In the alternative, if no further sub-data objects remain to be processed, an indication is made to the effect that the storage of the sub-data object(s) was successful (1580), and assigned node sub-data object storage process 1500 concludes.

FIG. 16 is a flow diagram illustrating an example of a remote node sub-data object storage process implemented in a scalable deduplication system, according to one embodiment. FIG. 16 thus depicts a remote node sub-data object storage process 1600, which begins with the receipt of a sub-data object from the assigned node (1610). In a manner comparable to that noted earlier, the remote node locks the requisite resources in preparation for adding a reference to its data (e.g., a corresponding one of the data segments maintained by the remote node) (1620). The remote node then adds the aforementioned reference (1630), and then unlocks the affected resource(s) (1640). A determination is then made as to whether the sub-data object was processed successfully (1650). If such processing was successful, the remote node sends a message to the assigned node indicating that processing of the sub-data object by the remote node was successful (1660), and remote node sub-data object storage process 1600 concludes. Alternatively, if processing of the sub-data object was not successful, the remote node sends a message to the assigned node indicating that such processing was unsuccessful (1670), and remote node sub-data object storage process 1600 concludes.
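
The remote node's handling reduces to the same lock-add-unlock discipline, sketched here with the reference database again modeled as a mapping (an assumption made for illustration):

    import threading

    def handle_sub_data_object(sub, reference_db, lock):
        """Sketch of FIG. 16: receive the sub-data object (1610), lock (1620),
        add the carried reference (1630), unlock (1640), and report the
        status to the assigned node (1650-1670)."""
        try:
            with lock:                                              # (1620/1640)
                location = sub["segment_location"]
                reference_db.setdefault(location, set()).add(sub["backup_id"])  # (1630)
            return "sub-data object processed"                      # (1660)
        except KeyError:
            return "sub-data object processing failed"              # (1670)

    print(handle_sub_data_object(
        {"segment_location": "container-42:7", "backup_id": "backup-1"},
        {}, threading.Lock()))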

FIG. 17 is a simplified block diagram illustrating an example of components of a scalable backup deduplication system and its operation, according to one embodiment. FIG. 17 thus illustrates a scalable deduplication system 1700 that includes certain components described previously in connection with FIGS. 1-5. Scalable deduplication system 1700 once again includes a client system (client system 110(1), or more simply, the client system), nodes (nodes 130(1)-(2), or more simply the first node and the second node), and a cluster management server (cluster management server 140), communicatively coupled in the aforementioned manner. Also as before, a catalog (catalog 160) is maintained at node 130(1), and nodes 130(1)-(2) maintain their respective local reference databases (local reference databases 220(1)-(2), or more simply the first and second local reference databases) and local deduplication pools (local deduplication pools 225(1)-(2), or more simply the first and second local deduplication pools).

The operations illustrated as occurring in scalable deduplication system 1700 are as follows. The client system first determines the identity of the assigned node by referencing catalog 160 (1705), the assigned node having been assigned by cluster management server 140 (1710). Having determined the assigned node (in this case, the first node), the client accesses the catalog to determine the location of the previous full backup image and any incremental backup images (1715). The client then reads the signature lists for the full backup and any incremental backups from the assigned node (1720). At this juncture, the client performs lookup operations (i.e., searches) on the signature lists thus obtained (1725). If the client finds the given signature in the signature lists, a reference can be added to the local reference database in question, subsequently.

Alternatively, if the signature in question is not found in the signature lists, the client performs a lookup operation on the assigned node's location (1730). However, if the assigned node is full, for example, a lookup operation can be performed in the assigned forward node location pool, at the second node (1735). If the signature in question is not found at the first node or the second node, the data segment can be written to the assigned node (1740) and stored in its local deduplication pool (the first local deduplication pool), or, if the first node is full, written to the assigned forward node (the second node) and stored in its local deduplication pool (the second local deduplication pool) (1745). At this juncture, the client system can send the signature as a reference to the data segment to the assigned node (the first node) for storage in its local reference database (the first local reference database) (1750), or, if the first node is full, to the assigned forward node (the second node) for storage in its local reference database (the second local reference database) (1755).
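
The full-node fallback in this write path can be sketched briefly; nodes are modeled as dictionaries with a capacity, a deduplication pool, and a reference database (an illustrative model only, not the disclosure's node structure):

    def write_segment(fingerprint, data, assigned, forward):
        """Sketch of the FIG. 17 write path: a unique segment is written to
        the assigned node (1740/1750) unless that node is full, in which
        case the assigned forward node receives it (1745/1755)."""
        target = assigned if len(assigned["pool"]) < assigned["capacity"] else forward
        target["pool"][fingerprint] = data   # local deduplication pool
        target["refs"][fingerprint] = 1      # local reference database
        return target["name"]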

In performing a process such as that just described, the client systems of scalable deduplication system 1700 are able to provide data segments and their references to nodes 130 for unique data segments, and to add references for non-unique data segments. In so doing, the process of storing such deduplicated backup images also results in the updating of catalog 160, thereby facilitating the storage of further deduplicated backup images.

As can be seen in FIG. 17, a given backup image remains tied to one backup source (e.g., the client system), and node information can be recorded on a per-backup image (and so, per-backup source) basis. Further, the storage of backup images can be assigned (i.e., the assigned node for a given backup image assigned) according to data affinity, allowing for the more efficient use of computing and networking resources. In the case in which a given assigned node becomes full (or becomes overloaded with respect to computing, storage, and/or networking resources), a new node can be chosen based on its resource usage (while being tracked under the related node list for the backup source). Additionally, some or all of the existing data/metadata for a given data source can be moved to the new assigned node. Further still, the latest node for a given backup source can be selected, in order to maintain locality of that backup source's data. This can be achieved, in certain embodiments, by selecting the latest node in the backup source node list as the assigned node.

An Example Computing and Network Environment

As noted, the systems described herein can be implemented using a variety of computer systems and networks. The following illustrates an example configuration of a computing device such as those described herein. The computing device may include one or more processors, a random access memory (RAM), communication interfaces, a display device, other input/output (I/O) devices (e.g., keyboard, trackball, and the like), and one or more mass storage devices (e.g., optical drive (e.g., CD, DVD, or Blu-ray), disk drive, solid state disk drive, non-volatile memory express (NVME) drive, or the like), configured to communicate with each other, such as via one or more system buses or other suitable connections. While a single system bus is illustrated for ease of understanding, it should be understood that the system bus may include multiple buses, such as a memory device bus, a storage device bus (e.g., serial ATA (SATA) and the like), data buses (e.g., universal serial bus (USB) and the like), video signal buses (e.g., ThunderBolt®, DVI, HDMI, and the like), power buses, or the like.

Such CPUs are hardware devices that may include a single processing unit or a number of processing units, all of which may include single or multiple computing units or multiple cores. Such a CPU may include a graphics processing unit (GPU) that is integrated into the CPU, or the GPU may be a separate processor device. The CPU may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, graphics processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the CPU may be configured to fetch and execute computer-readable instructions stored in a memory, mass storage device, or other computer-readable storage media.

Memory and mass storage devices are examples of computer storage media (e.g., memory storage devices) for storing instructions that can be executed by the processors to perform the various functions described herein. For example, memory can include both volatile memory and non-volatile memory (e.g., RAM, ROM, or the like) devices. Further, mass storage devices may include hard disk drives, solid-state drives, removable media, including external and removable drives, memory cards, flash memory, floppy disks, optical disks (e.g., CD, DVD, Blu-ray), a storage array, a network attached storage, a storage area network, or the like. Both memory and mass storage devices may be collectively referred to as memory or computer storage media herein, and may be any type of non-transitory media capable of storing computer-readable, processor-executable program instructions as computer program code that can be executed by the processors as a particular machine configured for carrying out the operations and functions described in the implementations herein.

The computing device may include one or more communication interfaces for exchanging data via a network. The communication interfaces can facilitate communications within a wide variety of networks and protocol types, including wired networks (e.g., Ethernet, DOCSIS, DSL, Fiber, USB, etc.) and wireless networks (e.g., WLAN, GSM, CDMA, 802.11, Bluetooth, Wireless USB, ZigBee, cellular, satellite, etc.), the Internet, and the like. Communication interfaces can also provide communication with external storage, such as a storage array, network attached storage, storage area network, cloud storage, or the like.

The display device may be used for displaying content (e.g., information and images) to users. Other I/O devices may be devices that receive various inputs from a user and provide various outputs to the user, and may include a keyboard, a touchpad, a mouse, a printer, audio input/output devices, and so forth. The computer storage media, such as memory and mass storage devices, may be used to store software and data, such as, for example, an operating system, one or more drivers (e.g., including a video driver for a display device), one or more applications, and data. Examples of such computing and network environments are described below with reference to FIGS. 18 and 19.

FIG. 18 depicts a block diagram of a computer system 1810 suitable for implementing aspects of the systems described herein. Computer system 1810 includes a bus 1812 which interconnects major subsystems of computer system 1810, such as a central processor 1814, a system memory 1817 (typically RAM, but which may also include ROM, flash RAM, or the like), an input/output controller 1818, an external audio device, such as a speaker system 1820 via an audio output interface 1822, an external device, such as a display screen 1824 via display adapter 1826, serial ports 1828 and 1830, a keyboard 1832 (interfaced with a keyboard controller 1833), a storage interface 1834, a USB controller 1837 operative to receive a USB drive 1838, a host bus adapter (HBA) interface card 1835A operative to connect with an optical network 1890, a host bus adapter (HBA) interface card 1835B operative to connect to a SCSI bus 1839, and an optical disk drive 1840 operative to receive an optical disk 1842. Also included are a mouse 1846 (or other point-and-click device, coupled to bus 1812 via serial port 1828), a modem 1847 (coupled to bus 1812 via serial port 1830), and a network interface 1848 (coupled directly to bus 1812).

Bus 1812 allows data communication between central processor 1814 and system memory 1817, which may include read-only memory (ROM) or flash memory (neither shown), and random access memory (RAM) (not shown), as previously noted. RAM is generally the main memory into which the operating system and application programs are loaded. The ROM or flash memory can contain, among other code, the Basic Input-Output System (BIOS) which controls basic hardware operation such as the interaction with peripheral components. Applications resident with computer system 1810 are generally stored on and accessed from a computer-readable storage medium, such as a hard disk drive (e.g., fixed disk 1844), an optical drive (e.g., optical drive 1840), a universal serial bus (USB) controller 1837, or other computer-readable storage medium.

Storage interface 1834, as with the other storage interfaces of computer system 1810, can connect to a standard computer-readable medium for storage and/or retrieval of information, such as a fixed disk drive 1844. Fixed disk drive 1844 may be a part of computer system 1810 or may be separate and accessed through other interface systems. Modem 1847 may provide a direct connection to a remote server via a telephone link or to the Internet via an internet service provider (ISP). Network interface 1848 may provide a direct connection to a remote server via a direct network link to the Internet via a POP (point of presence). Network interface 1848 may provide such connection using wireless techniques, including digital cellular telephone connection, Cellular Digital Packet Data (CDPD) connection, digital satellite data connection, or the like.

Many other devices or subsystems (not shown) may be connected in a similar manner (e.g., document scanners, digital cameras, and so on). Conversely, all of the devices shown in FIG. 18 need not be present to practice the systems described herein. The devices and subsystems can be interconnected in different ways from that shown in FIG. 18. The operation of a computer system such as that shown in FIG. 18 will be readily understood in light of the present disclosure. Code to implement portions of the systems described herein can be stored in computer-readable storage media such as one or more of system memory 1817, fixed disk 1844, optical disk 1842, or USB drive 1838. The operating system provided on computer system 1810 may be WINDOWS, UNIX, LINUX, IOS, or another operating system.

Moreover, regarding the signals described herein, those skilled in the art will recognize that a signal can be directly transmitted from a first block to a second block, or a signal can be modified (e.g., amplified, attenuated, delayed, latched, buffered, inverted, filtered, or otherwise modified) between the blocks. Although the signals of the above described embodiment are characterized as transmitted from one block to the next, other embodiments may include modified signals in place of such directly transmitted signals as long as the informational and/or functional aspect of the signal is transmitted between blocks. To some extent, a signal input at a second block can be conceptualized as a second signal derived from a first signal output from a first block due to physical limitations of the circuitry involved (e.g., there will inevitably be some attenuation and delay). Therefore, as used herein, a second signal derived from a first signal includes the first signal or any modifications to the first signal, whether due to circuit limitations or due to passage through other circuit elements which do not change the informational and/or functional aspect of the first signal.

FIG. 19 is a block diagram depicting a network architecture 1900 in which client systems 1910, 1920 and 1930, as well as storage servers 1940A and 1940B (any of which can be implemented using computer system 1810), are coupled to a network 1950. Storage server 1940A is further depicted as having storage devices 1960A(1)-(N) directly attached, and storage server 1940B is depicted with storage devices 1960B(1)-(N) directly attached. Storage servers 1940A and 1940B are also connected to a SAN fabric 1970, although connection to a storage area network is not required for operation. SAN fabric 1970 supports access to storage devices 1980(1)-(N) by storage servers 1940A and 1940B, and so by client systems 1910, 1920 and 1930 via network 1950. An intelligent storage array 1990 is also shown as an example of a specific storage device accessible via SAN fabric 1970.

With reference to computer system 1810, modem 1847, network interface 1848, or some other method can be used to provide connectivity from each of client computer systems 1910, 1920 and 1930 to network 1950. Client systems 1910, 1920 and 1930 are able to access information on storage server 1940A or 1940B using, for example, a web browser or other client software (not shown). Such a client allows client systems 1910, 1920 and 1930 to access data hosted by storage server 1940A or 1940B, or by one of storage devices 1960A(1)-(N), 1960B(1)-(N), 1980(1)-(N), or intelligent storage array 1990. FIG. 19 depicts the use of a network such as the Internet for exchanging data, but the systems described herein are not limited to the Internet or any particular network-based environment.

Other Embodiments

The example systems and computing devices described herein are well adapted to attain the advantages mentioned, as well as others inherent therein. While such systems have been depicted, described, and are defined by reference to particular descriptions, such references do not imply a limitation on the claims, and no such limitation is to be inferred. The systems described herein are capable of considerable modification, alteration, and equivalents in form and function, as will occur to those ordinarily skilled in the pertinent arts in considering the present disclosure. The depicted and described embodiments are examples only, and are in no way exhaustive of the scope of the claims.

Such example systems and computing devices are merely examples suitable for some implementations, and are not intended to suggest any limitation as to the scope of use or functionality of the environments, architectures and frameworks that can implement the processes, components and features described herein. Thus, implementations herein are operational with numerous environments or architectures, and may be implemented in general purpose and special-purpose computing systems, or other devices having processing capability. Generally, any of the functions described with reference to the figures can be implemented using software, hardware (e.g., fixed logic circuitry) or a combination of these implementations. The term “module,” “mechanism” or “component” as used herein generally represents software, hardware, or a combination of software and hardware that can be configured to implement prescribed functions. For instance, in the case of a software implementation, the term “module,” “mechanism” or “component” can represent program code (and/or declarative-type instructions) that performs specified tasks or operations when executed on a processing device or devices (e.g., CPUs or processors). The program code can be stored in one or more computer-readable memory devices or other computer storage devices. Thus, the processes, components and modules described herein may be implemented by a computer program product.

The foregoing thus describes embodiments including components contained within other components (e.g., the various elements shown as components of computer system 1810). Such architectures are merely examples, and, in fact, many other architectures can be implemented which achieve the same functionality. In an abstract but still definite sense, any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermediate components. Likewise, any two components so associated can also be viewed as being “operably connected,” or “operably coupled,” to each other to achieve the desired functionality.

Furthermore, this disclosure provides various example implementations, as described and as illustrated in the drawings. However, this disclosure is not limited to the implementations described and illustrated herein, but can extend to other implementations, as would be known or as would become known to those skilled in the art. Reference in the specification to “one implementation,” “this implementation,” “these implementations” or “some implementations” means that a particular feature, structure, or characteristic described is included in at least one implementation, and the appearances of these phrases in various places in the specification are not necessarily all referring to the same implementation. As such, the various embodiments of the systems described herein have been described via the use of block diagrams, flowcharts, and examples. It will be understood by those within the art that each block diagram component, flowchart step, operation and/or component illustrated by the use of examples can be implemented (individually and/or collectively) by a wide range of hardware, software, firmware, or any combination thereof.

The systems described herein have been described in the context of fully functional computer systems; however, those skilled in the art will appreciate that the systems described herein are capable of being distributed as a program product in a variety of forms, and that the systems described herein apply equally regardless of the particular type of computer-readable media used to actually carry out the distribution. Examples of computer-readable media include computer-readable storage media, as well as media storage and distribution systems developed in the future.

The above-discussed embodiments can be implemented by software modules that perform one or more tasks associated with the embodiments. The software modules discussed herein may include script, batch, or other executable files. The software modules may be stored on a machine-readable or computer-readable storage media such as magnetic floppy disks, hard disks, semiconductor memory (e.g., RAM, ROM, and flash-type media), optical discs (e.g., CD-ROMs, CD-Rs, and DVDs), or other types of memory modules. A storage device used for storing firmware or hardware modules in accordance with an embodiment can also include a semiconductor-based memory, which may be permanently, removably or remotely coupled to a microprocessor/memory system. Thus, the modules can be stored within a computer system memory to configure the computer system to perform the functions of the module. Other new and various types of computer-readable storage media may be used to store the modules discussed herein.

In light of the foregoing, it will be appreciated that the foregoing descriptions are intended to be illustrative and should not be taken to be limiting. As will be appreciated in light of the present disclosure, other embodiments are possible. Those skilled in the art will readily implement the steps necessary to provide the structures and the methods disclosed herein, and will understand that the process parameters and sequence of steps are given by way of example only and can be varied to achieve the desired structure, as well as modifications that are within the scope of the claims. Variations and modifications of the embodiments disclosed herein can be made based on the description set forth herein, without departing from the scope of the claims, giving full cognizance to equivalents thereto in all respects.

Although the present invention has been described in connection with several embodiments, the invention is not intended to be limited to the specific forms set forth herein. On the contrary, it is intended to cover such alternatives, modifications, and equivalents as can be reasonably included within the scope of the invention as defined by the appended claims.

What is claimed is:
 1. A method comprising: receiving a data object from a client system at an assigned node, wherein the assigned node is a node of a plurality of nodes of a cluster, the data object is being backed up as part of a backup operation for the client system, the assigned node is assigned to the backup operation and stores a catalog for use in the backup operation, the data object comprises a data segment, and a signature, and the signature is generated based, at least in part, on data of the data segment; determining whether the data object comprises a sub-data object, wherein the determining uses the catalog; and in response to a determination that the data object comprises the sub-data object, processing the data object, wherein the backup operation comprises the determining and the processing, the assigned node performs the determining and the processing the data object, the data segment is stored in a first local deduplication pool at the assigned node, the signature is stored in a first local metadata store at the assigned node, and the processing the data object comprises determining a remote node at which the sub-data object is to be stored, generating a reference that identifies the sub-data object and the remote node, storing the reference as a stored reference in a catalog at the assigned node, wherein storage of the stored reference in the catalog facilitates access to the sub-data object at the remote node, and sending the sub-data object to the remote node, wherein the sending the sub-data object facilitates storage of a data segment of the sub-data object in a second local deduplication pool at the remote node, and a signature of the sub-data object in a second local metadata store at the remote node, and the remote node is another node of the plurality of nodes, other than the assigned node.
 2. The method of claim 1, wherein the data object comprises a container, the container comprises a container deduplicated data store and a container metadata store, the container deduplicated data store comprises one or more data segments comprising the data segment, and the container metadata store comprises metadata associated with the one or more data segments.
 3. The method of claim 2, wherein the metadata comprises the signature of the data segment and a location in the container deduplicated data store at which the data segment is stored.
 4. The method of claim 3, wherein the signature is a fingerprint, and the fingerprint was generated by performing a hash function on the data of the data segment.
 5. The method of claim 1, wherein the data object comprises a container, and the sending the sub-data object to the remote node comprises: sending the container to the remote node; and sending a container reference to the remote node, wherein the container reference comprises a container identifier that identifies the container.
 6. The method of claim 5, further comprising: receiving the sub-data object at the remote node; storing the container in a local deduplication pool at the remote node; and storing the container reference in a local reference database at the remote node.
 7. The method of claim 6, further comprising: receiving a request for a fingerprint list from a client system; retrieving the fingerprint list from a catalog; and sending the fingerprint list to the client system.
 8. The method of claim 7, further comprising: receiving a request for a location of the fingerprint list from the client system; determining the location; and sending the location to the client system.
 9. The method of claim 7, wherein the catalog is implemented as a single instance for the cluster.
 10. The method of claim 1, wherein the data object comprises a container and a container reference, and the method further comprises: storing the container in a local deduplication pool at the assigned node, wherein the container comprises a deduplicated data store, and a metadata store; and storing the container reference in a local reference database at the assigned node, wherein the container reference identifies the container.
 11. The method of claim 1, wherein the determining whether the sub-data object is to be stored at the assigned node is based, at least in part, on at least one of a computational resource of the assigned node, a storage resource of the assigned node, a network resource of the assigned node, or the sub-data object being a remote reference.
 12. A non-transitory computer-readable storage medium, comprising program instructions, which, when executed by one or more processors of a computing system, perform a method comprising: receiving a data object from a client system at an assigned node, wherein the assigned node is a node of a plurality of nodes of a cluster, the data object is being backed up as part of a backup operation for the client system, the assigned node is assigned to the backup operation and stores a catalog for use in the backup operation, the data object comprises a data segment, and a signature, and the signature is generated based, at least in part, on data of the data segment; determining whether the data object comprises a sub-data object, wherein the determining uses the catalog; and in response to a determination that the data object comprises the sub-data object, processing the data object, wherein the backup operation comprises the determining and the processing, the assigned node performs the determining and the processing the data object, the data segment is stored in a first local deduplication pool at the assigned node, the signature is stored in a first local metadata store at the assigned node, and the processing the data object comprises determining a remote node at which the sub-data object is to be stored, generating a reference that identifies the sub-data object and the remote node, storing the reference as a stored reference in a catalog at the assigned node, wherein storage of the stored reference in the catalog facilitates access to the sub-data object at the remote node, and sending the sub-data object to the remote node, wherein the sending the sub-data object facilitates storage of a data segment of the sub-data object in a second local deduplication pool at the remote node, and a signature of the sub-data object in a second local metadata store at the remote node, and the remote node is another node of the plurality of nodes, other than the assigned node.
 13. The non-transitory computer-readable storage medium of claim 12, wherein the data object comprises a container, the container comprises a container deduplicated data store and a container metadata store, the container deduplicated data store comprises the data segment, the signature is a fingerprint, the container metadata store comprises metadata comprising the fingerprint and a location of the data segment in the container deduplicated data store, and the fingerprint was generated by performing a hash function on the data of the data segment.
 14. The non-transitory computer-readable storage medium of claim 12, wherein the catalog is implemented as a single instance for the cluster.
 15. The non-transitory computer-readable storage medium of claim 12, wherein the data object comprises a container, and the sending the sub-data object to the remote node comprises: sending the container to the remote node; and sending a container reference to the remote node, wherein the container reference comprises a container identifier that identifies the container.
 16. The non-transitory computer-readable storage medium of claim 15, wherein the data object comprises a container and the method further comprises: storing the container in a local deduplication pool at the assigned node, wherein the container comprises a deduplicated data store, and a metadata store; and storing the container reference in a local reference database at the assigned node, wherein the container reference identifies the container.
 17. A computing system comprising: one or more processors; and a computer-readable storage medium coupled to the one or more processors, comprising program instructions, which, when executed by the one or more processors, perform a method comprising receiving a data object from a client system at an assigned node, wherein the assigned node is a node of a plurality of nodes of a cluster, the data object is being backed up as part of a backup operation for the client system, the assigned node is assigned to the backup operation and stores a catalog for use in the backup operation, the data object comprises a data segment, and a signature, and the signature is generated based, at least in part, on data of the data segment, determining whether the data object comprises a sub-data object, wherein the determining uses the catalog, and in response to a determination that the data object comprises the sub-data object, processing the data object, wherein the backup operation comprises the determining and the processing, the assigned node performs the determining and the processing the data object, the data segment is stored in a first local deduplication pool at the assigned node, the signature is stored in a first local metadata store at the assigned node, and the processing the data object comprises determining a remote node at which the sub-data object is to be stored, generating a reference that identifies the sub-data object and the remote node, storing the reference as a stored reference in a catalog at the assigned node, wherein storage of the stored reference in the catalog facilitates access to the sub-data object at the remote node, and sending the sub-data object to the remote node, wherein the sending the sub-data object facilitates storage of a data segment of the sub-data object in a second local deduplication pool at the remote node, and a signature of the sub-data object in a second local metadata store at the remote node, and the remote node is another node of the plurality of nodes, other than the assigned node.
 18. The computing system of claim 17, wherein the data object comprises a container, the container comprises a container deduplicated data store and a container metadata store, the container deduplicated data store comprises the data segment, the signature is a fingerprint, the container metadata store comprises metadata comprising the fingerprint and a location of the data segment in the container deduplicated data store, and the fingerprint was generated by performing a hash function on the data of the data segment.
 19. The computing system of claim 17, wherein the catalog is implemented as a single instance for the cluster.
 20. The computing system of claim 17, wherein the data object comprises a container, and the sending the sub-data object to the remote node comprises: sending the container to the remote node; and sending a container reference to the remote node, wherein the container reference comprises a container identifier that identifies the container.
 21. The computing system of claim 20, wherein the data object comprises a container, and the method further comprises: storing the container in a local deduplication pool at the assigned node, wherein the container comprises a deduplicated data store, and a metadata store; and storing the container reference in a local reference database at the assigned node, wherein the container reference identifies the container.